1.  Introduction


Conceptual  clustering  involves  grouping  objects  into  conceptually  similar 
classes  and  producing  a  characterization  of  those  classes.  In  recent  years 
there  has  been  active  research  in  the  area  of  conceptual  clustering.  For  a 
survey  of  several  conceptual  clustering  systems,  see  [2].  All  of  these  systems 
have  focused  on  feature  descriptions  of  the  objects,  such  as  color  or  size, 
to form a coherent classification. Only Stepp & Michalski [8] have left this
narrow domain and used structural descriptions of objects, i.e., attributes of
object components and the relationships among these components, to form
classes.


However, no system thus far has used relational information to classify
the set of objects. This paper describes a system called OPUS, implemented
in Prolog, which addresses this issue by using relations over the set
of objects (and not simply object components as in structural description),
as well as features of objects, to form classes. We thus extend the definition
of conceptual clustering [6] to include relational information.


Given: 


•  A  set  of  objects 

•  A  set  of  features  describing  the  objects 

•  A  set  of  relations  between  the  objects 


•  Criteria  to  evaluate  the  quality  of  a  classification 

Find:

•  A hierarchy of classes and a characterization of the classes

Using relational information, the OPUS system eliminates a deficiency
of previous conventional clustering systems; unlike the other systems, this
system is able to distinguish between objects which have the same features
but different relations. For example, in the domain of genetics, OPUS
is able to classify peas not only in terms of their color but also in terms
of their offspring, effectively defining the classes of hybrids and purebreds.
Another deficiency of other conceptual clustering systems is the inability
to create new attributes; all attributes used to characterize objects have to
be given to such systems. In contrast, OPUS is able to generate attributes
if it determines that the current description of the objects is not sufficient.
New attributes are defined as chunks formed from relations and features.

In  the  next  section  we  describe  the  OPUS  system,  detailing  the  use 
of  relations  to  form  a  classification  and  the  generation  of  new  attributes. 
In  the  third  section  we  give  two  applications  to  illustrate  the  system.  We 
conclude  with  two  proposals  for  extending  this  work. 

2.  The  OPUS  System 

The  input  to  the  OPUS  system  consists  of  the  objects  to  be  classified,  a  set 
of  features  describing  the  objects,  and  a  set  of  relations  over  the  object  set, 
such  as  eat  or  parent.  The  system  generates  a  hierarchical  tree  of  classes, 
each  class  having  a  unique  conceptual  description.  The  system  divides  the 
object  set  into  mutually  exclusive  classes,  and  recursively  divides  the  classes 
until  a  final  partitioning  is  found.  At  first,  features  such  as  color  or  size 
are  used  as  attributes  to  form  classes.  After  the  list  of  current  attributes 
is  exhausted  (i.e.,  all  members  of  a  given  class  have  the  same  value  for  the 
given  features),  new  attributes  are  generated.  Using  these  new  attributes, 
the clustering algorithm refines the previously formed classes until all
members of the classes have the same value for all current attributes. OPUS
continues the cycle of generating attributes and refining classes until new
attributes cannot be used to further divide classes. OPUS consists of two
distinct parts, the clustering algorithm and the attribute generator, which
are described in detail in the following sections.

2.1  The  Clustering  Algorithm 

The OPUS clustering scheme is based on the RUMMAGE clustering
algorithm [1]. The goal of the algorithm is to build a hierarchical tree of
mutually exclusive classes (clusters) for a given object set. Each object of
the set has associated attribute/value pairs for a list of attributes. The
hierarchy tree is built in a top-down fashion. At each node in the tree, the
algorithm selects an attribute which best partitions the object set according
to some clustering criteria.

After an attribute has been selected, the object set is divided into
mutually exclusive classes whose members have the same value for the chosen
attribute. An arc in the hierarchy tree is labeled with the value for the
chosen attribute at that node, and with any values of other attributes which
are common to all members of that class. The procedure is called recursively
until the classes cannot be further divided using the given attributes. At
this point OPUS defines new attributes and again applies the clustering
algorithm to refine the classes. If the new attributes cannot further
divide the classes, OPUS decides that it has determined the final classes
and terminates.
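
The top-down recursion described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the OPUS implementation (which is written in Prolog): the function names, the `choose` callback that stands in for the clustering criteria, and the locomotion values in the sample data are all hypothetical, and attribute generation is left out.

```python
from collections import defaultdict

def partition(objects, attribute):
    """Group object names by their (hashable) value for one attribute."""
    groups = defaultdict(list)
    for name, attrs in objects.items():
        groups[attrs[attribute]].append(name)
    return dict(groups)

def build_tree(objects, attributes, choose):
    """Split `objects` top-down; `choose` picks the partitioning attribute."""
    # Only attributes on which the members still differ can divide the class.
    useful = [a for a in attributes
              if len({attrs[a] for attrs in objects.values()}) > 1]
    if not useful:
        return sorted(objects)  # leaf: the class cannot be divided further
    best = choose(objects, useful)
    rest = [a for a in useful if a != best]
    # Each arc is labeled with the chosen attribute and its value.
    return {
        (best, value): build_tree({n: objects[n] for n in names}, rest, choose)
        for value, names in partition(objects, best).items()
    }

# Hypothetical sample data; `choose` here simply takes the first attribute.
animals = {
    "hawks":  {"size": "medium", "locomotion": "fly"},
    "snakes": {"size": "medium", "locomotion": "crawl"},
    "mice":   {"size": "small",  "locomotion": "walk"},
}
tree = build_tree(animals, ["size", "locomotion"], lambda objs, attrs: attrs[0])
```

In a fuller version, reaching a leaf with more than one member would trigger the attribute generator of section 2.2 before giving up on that class.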

2.1.1  The  selection  of  an  attribute 

Given an object set and a list of attributes, we want to select the attribute
which best partitions the set over the remaining attributes. In order to
measure the quality of a proposed clustering, OPUS forms a complex for
each value of an attribute. A complex is the logical implication for the
value of an attribute over the remaining attributes [6]. Suppose that we
have the object set {K, L, M, N, O} with associated attribute/value pairs
for attributes A, B, and C as follows¹:

K = { A = [a], B = [x], C = [m,n] }

L = { A = [a], B = [y], C = [m] }

M = { A = [b], B = [x], C = [m] }

N = { A = [b], B = [x], C = [n] }

O = { A = [a], B = [y], C = [n] }

Given this data, the complexes for attribute A for values [a] and [b] over
attributes B and C are:

(1)  $[a] \Rightarrow \{(B = [y] \lor [x]) \land (C = [m,n] \lor [m] \lor [n])\}$

(2)  $[b] \Rightarrow \{(B = [x]) \land (C = [m] \lor [n])\}$

That is, if an object has a value of [b] for attribute A, it implies that it has a
value of [x] for attribute B, and a value of [n] or [m] for attribute C. OPUS
forms these complexes for all values of all attributes. The complexes are
used to determine the quality of an attribute. OPUS uses two clustering
criteria, the simplicity of the cluster description and the inter-cluster
difference, which we now discuss.

¹ In the OPUS system, values of attributes are sets. (See section 2.2)
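
As a rough illustration of how the complexes for the example objects K through O can be collected, consider the Python sketch below. The dictionary layout and the function name `complexes` are assumptions made for this example; set-valued attribute values are stored as frozensets so that they can be placed inside other sets.

```python
def complexes(objects, attribute):
    """Map each value of `attribute` to its selectors:
    {other attribute: set of values co-occurring with that value}."""
    result = {}
    for attrs in objects.values():
        selectors = result.setdefault(attrs[attribute], {})
        for other, value in attrs.items():
            if other != attribute:
                selectors.setdefault(other, set()).add(value)
    return result

fs = frozenset  # frozenset("mn") is the set value [m,n]
data = {
    "K": {"A": fs("a"), "B": fs("x"), "C": fs("mn")},
    "L": {"A": fs("a"), "B": fs("y"), "C": fs("m")},
    "M": {"A": fs("b"), "B": fs("x"), "C": fs("m")},
    "N": {"A": fs("b"), "B": fs("x"), "C": fs("n")},
    "O": {"A": fs("a"), "B": fs("y"), "C": fs("n")},
}
cx = complexes(data, "A")
```

Here `cx[fs("a")]` corresponds to complex (1) and `cx[fs("b")]` to complex (2).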

2.1.2  The  clustering  criteria 

The  simplicity  criterion  is  used  to  choose  a  partitioning  attribute  which 
forms a simple description, so that it is easy to characterize and
differentiate classes. A second criterion is used to avoid the trivial and arbitrary
classification  which  might  occur  if  the  above  criterion  were  used  alone  [6]; 
the  inter-cluster  difference  measures  the  disjointness  of  two  complexes.  The 
less  values  overlap  among  the  remaining  attributes,  the  higher  this  degree  of 
disjointness  will  be.  A  good  classification  has  simple  class  descriptions  and 
a  high  degree  of  inter-cluster  difference  to  maximize  the  distance  between 
classes. 

The simplicity measure is a normalized value of the number of terms
in the complexes of an attribute. A complex consists of a logical product
of selectors. Each selector is a list of elements from the possible values of
an attribute linked by internal disjunction. The complexity of a selector
is the number of terms of the selector divided by the number of terms the
selector could have, i.e., the number of domain elements for the attribute of
the selector. The complexity of an attribute is the average of the complexity
values of all of the selectors of that attribute. The simplicity of an attribute
is defined to be the negative of the complexity [6]. The complexity of the
second selector of complex (2) in our example is $\frac{2}{3}$, because that
selector has two elements ([n] and [m]), and there are three possible values
([n], [m], and [m,n]) that attribute C can have. In complex (1), the second
selector has a complexity value of $\frac{3}{3} = 1$. The value of complexity
for attribute A is the average of 1, 1, $\frac{1}{2}$, and $\frac{2}{3}$, which
is $\frac{19}{24}$. Thus the simplicity for attribute A is $-\frac{19}{24}$.
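
The simplicity computation for attribute A can be checked with a short Python sketch. The function name and data layout are assumptions for illustration; set values such as [m,n] are written as plain string labels here, and the domain sizes (two possible values for B, three for C) are taken from the example.

```python
from fractions import Fraction

def simplicity(complexes, domain_sizes):
    """Negative of the average selector complexity, where a selector's
    complexity is (number of its terms) / (size of the attribute's domain)."""
    ratios = [Fraction(len(selector), domain_sizes[attr])
              for selectors in complexes.values()
              for attr, selector in selectors.items()]
    return -sum(ratios) / len(ratios)

# Complexes (1) and (2) of attribute A from the example.
complexes_A = {
    "a": {"B": {"x", "y"}, "C": {"mn", "m", "n"}},
    "b": {"B": {"x"},      "C": {"m", "n"}},
}
domain_sizes = {"B": 2, "C": 3}
value = simplicity(complexes_A, domain_sizes)  # Fraction(-19, 24)
```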

The computation of the inter-cluster difference of two complexes is more
involved. We define a selector element to be an element of a selector, that
is, an element of the domain of an attribute. (Values of attributes in the
OPUS system are sets.) The similarity between two selector elements, $e_i$
and $e_j$, is defined to be

$sim(e_i, e_j) = \frac{|e_i \cap e_j|}{|e_i \cup e_j|}$.²

The maximum similarity of a reference element $e$ of a selector $S_i$ to
selector $S_j$ is $\max\{sim(e, e_k)\}$, for all $e_k \in S_j$. The value
$P_{ij}$ is the average of the maximum similarities of all selector elements
of selector $S_i$ to selector $S_j$. Now, the degree of similarity of complex
$C_k$ to $C_l$, denoted $Sim_{kl}$, is the average over all $P_{ij}$, where
$i$ and $j$ are all the selectors of identical attribute parts. The degree of
difference of complex $C_k$ to complex $C_l$, denoted $Dif_{kl}$, is just
$1 - Sim_{kl}$. Finally, the inter-cluster difference degree of an attribute
X is the average of all $Dif_{kl}$ values, $k \neq l$, where $k$ and $l$ are
complexes of all values of the attribute X.

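Referring again to the example, we compute the intermediate values needed
for the inter-cluster difference degree of attribute A. Subscripts 1 and 2
refer to the selectors of complexes (1) and (2).

For the selectors of attribute B, we compute values

$P_{12} = \frac{1}{2}\left(\max\{sim([y],[x])\} + \max\{sim([x],[x])\}\right) = \frac{0 + 1}{2} = \frac{1}{2}$

$P_{21} = \max\{sim([x],[y]),\ sim([x],[x])\} = \max\{0, 1\} = 1$

For the selectors of attribute C, we compute in a similar manner

$P_{12} = \frac{1}{3}\left(\frac{1}{2} + 1 + 1\right) = \frac{5}{6}$

$P_{21} = \frac{1}{2}(1 + 1) = 1$

Thus we have a degree of similarity of complex (1) to complex (2) of
attribute A of $Sim_{12} = (\frac{1}{2} + \frac{5}{6})/2 = \frac{2}{3}$ and a
degree of similarity of complex (2) to complex (1) of
$Sim_{21} = \frac{1 + 1}{2} = 1$. Therefore the degrees of difference are
$\frac{1}{3}$ and 0 respectively. The inter-cluster difference degree for
attribute A is

$(\frac{1}{3} + 0)/2 = \frac{1}{6}$.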
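
The chain of definitions above (element similarity, $P_{ij}$, $Sim_{kl}$, $Dif_{kl}$ and their average) can be traced with the following Python sketch. The function names and the list-of-dictionaries layout are assumptions for this illustration, and the sketch assumes every attribute value is a non-empty set.

```python
from fractions import Fraction
from itertools import permutations

def sim(e1, e2):
    """Similarity of two set-valued selector elements: |e1 & e2| / |e1 | e2|."""
    return Fraction(len(e1 & e2), len(e1 | e2))

def p(s_i, s_j):
    """Average, over elements of s_i, of their maximum similarity to s_j."""
    return sum(max(sim(e, ek) for ek in s_j) for e in s_i) / len(s_i)

def similarity(c_k, c_l):
    """Sim_kl: average of P over the selectors of matching attributes."""
    return sum(p(c_k[attr], c_l[attr]) for attr in c_k) / len(c_k)

def inter_cluster_difference(complexes):
    """Average of Dif_kl = 1 - Sim_kl over all ordered pairs with k != l."""
    difs = [1 - similarity(c_k, c_l) for c_k, c_l in permutations(complexes, 2)]
    return sum(difs) / len(difs)

fs = frozenset  # frozenset("mn") is the set value [m,n]
c1 = {"B": [fs("y"), fs("x")], "C": [fs("mn"), fs("m"), fs("n")]}  # complex (1)
c2 = {"B": [fs("x")],          "C": [fs("m"), fs("n")]}            # complex (2)
difference_A = inter_cluster_difference([c1, c2])
```

Because `p` averages over the elements of its first argument only, `similarity(c1, c2)` and `similarity(c2, c1)` generally differ, which is the asymmetry the measure relies on.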

This computation of the inter-cluster difference for an attribute makes
use of the fact that, in the OPUS system, values of attributes are partially
ordered. That is, value [a,b] is further from value [b,c,d] than it is from
value [a,b,c], and therefore $sim([a,b],[b,c,d])$ is less than
$sim([a,b],[a,b,c])$.

² $|e_1 \cap e_2|$ denotes the cardinality of the intersection of set $e_1$
and set $e_2$, while $|e_1 \cup e_2|$ denotes the cardinality of the union of
the two sets.

Class descriptions should be as distinct as possible to ensure classes with
different properties. Maximizing the inter-cluster difference will promote
such classes.

The  idea  of  an  asymmetric  similarity  measure  may  seem  counterintuitive 
at  first.  However,  Tversky  [9]  supports  an  asymmetric  similarity  measure, 
and  he  provides  evidence  that  humans  “tend  to  select  the  more  salient 
stimulus  ...  as  a  referent,  and  the  less  salient  stimulus  ...  as  a  subject.” 
Referring  once  again  to  the  complexes  in  the  example,  any  object  satisfying 
the  conditions  of  complex  (2)  also  satisfies  the  conditions  of  complex  (1), 
but not vice versa. Therefore $Sim_{21}$ has a higher value than $Sim_{12}$.

OPUS maximizes a trade-off between the inter-cluster difference and
the simplicity of a class description. At each level in the expanding
hierarchy tree, a quality value for each attribute is computed. This value
is the sum

u * simplicity + v * inter-cluster difference

for some user-specified coefficients u and v. The user can thus weigh the
importance of these two criteria. OPUS maximizes the quality value of
the attributes selected at each node in the expanding tree.
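
A minimal sketch of this selection step is given below. The function names are hypothetical, and the scores for size and locomotion are made-up numbers chosen only to mirror the situation of section 3.1, where both features have the same simplicity but size has the higher inter-cluster difference.

```python
from fractions import Fraction

def quality(simplicity, difference, u=1, v=1):
    """Weighted sum of the two clustering criteria."""
    return u * simplicity + v * difference

def best_attribute(scores, u=1, v=1):
    """Return the attribute whose (simplicity, difference) pair scores highest."""
    return max(scores, key=lambda a: quality(scores[a][0], scores[a][1], u, v))

# Attribute A of the running example, with u = v = 1.
q_A = quality(Fraction(-19, 24), Fraction(1, 6))  # Fraction(-5, 8)

# Hypothetical scores: equal simplicity, different inter-cluster difference.
scores = {"size": (-0.5, 0.6), "locomotion": (-0.5, 0.4)}
chosen = best_attribute(scores)  # "size"
```

Raising u relative to v favors attributes with short class descriptions, while raising v favors attributes whose classes overlap least.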

2.2  Generating  Attributes 

New attributes have to be defined when current attributes are not sufficient
to distinguish between members of the same class. New attributes are
chunks composed of relations and features. For this purpose, we define a
complex relation r_f(X,Y,Z) to be the composition of a relation r(X,Y)
and a feature f(Y,Z). For example, in the food chain domain animals could
be described by the feature size and the relationship eat. Thus the relation
eat(X,Y) and the feature size(Y,Z) are composed to form the complex
relation eat_size(X,Y,Z), describing that X eats Y and Y is of size Z. Note
that the first and second arguments of a complex relation are members of
the object set, while the third is a value of the feature. Complex relations
will be used as attributes.

The value of an attribute is defined as follows. Given a complex relation
r_f(X,Y,Z), the value of the attribute r_f for the object X is the set
$\{Z \mid \exists Y : r\_f(X,Y,Z)\}$. That is, the set of all Z's such that
r_f(X,Y,Z) is satisfied for some Y. For example, the value of eat_size for
snakes in the food chain domain is [small, medium], because
eat_size(snakes, Y, small) is satisfied for Y bound to mice and insects, and
eat_size(snakes, Y, medium) is satisfied for Y bound to snakes. Thus, the
attribute eat_size has a value of [small, medium] for snakes, because snakes
eat small and medium sized animals.
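
The definition of an attribute value can be illustrated with a few lines of Python. The representation (a relation as a set of pairs, a feature as a dictionary) and the function name are assumptions for this sketch; the facts about snakes are the ones stated in the example above.

```python
def attribute_value(x, relation, feature):
    """Value of the complex relation r_f for object x:
    the set of all Z such that r(x, Y) and f(Y, Z) hold for some Y."""
    return {feature[y] for (subject, y) in relation
            if subject == x and y in feature}

# eat(X, Y) facts and size(Y, Z) facts for the snakes example.
eat = {("snakes", "mice"), ("snakes", "insects"), ("snakes", "snakes")}
size = {"mice": "small", "insects": "small", "snakes": "medium"}

eat_size_of_snakes = attribute_value("snakes", eat, size)
```

The result is the set {small, medium}; an object that takes part in no eat fact would get the empty set.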

The  system  is  supplied  with  a  small  set  of  binary  relations  such  as  eat 
or  parent.  These  primitive  relations  involve  only  two  objects,  and  there  is 
a  direct  “link”  between  the  two  objects.  In  order  to  define  more  involved 
attributes,  relations  consisting  of  several  primitive  relations  are  formed.  We 
define  a  level  n  relation  as  a  relation  using  n  primitive  relations  between 
two objects. A primitive relation is a relation supplied to the system or
the inverse of that relation. The relation eaten(X,Y) describes the level
one relation eaten, meaning X is being eaten by Y, while eat_eat(X,Z)
describes the level two relationship of X eats some Y and Y eats Z. Relations
are defined in increasing levels of order, starting at level one. Now, a
level n attribute is defined from a complex relation composed of a level
n relation and an existing feature. Each time new attributes have to be
defined, the current level k is increased and level k+1 relations are defined.
These level k+1 relations are composed with features to define complex
relations and thus level k+1 attributes. Relations are not directly used in
the  clustering  process,  but  rather  used  to  define  attributes.  Only  attributes 
are  used  to  cluster  objects.  Thus,  objects  are  first  classified  based  upon  their 
features,  then  based  upon  attributes  with  increasing  complexity.  If  at  any 
time  new  relations  cannot  define  attributes  which  refine  classes,  the  system 
terminates  having  reached  a  final  classification. 

At each level k, new level k relations are defined. A level k−1 relation is
composed with a level one relation to form a level k relation. All inverses of
relations are defined. To limit the combinatorial explosion of the number
of possible relations which can be defined at each level, only a limited
number of the level k−1 relations are considered to define new relations. Only
the relations which defined attributes used to refine classes at level k−1 are
used at the next level to define new relations.
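
The construction of higher-level relations can be sketched in Python as below. Relations are modeled as sets of pairs; the function names, the naming scheme for composed relations, and the small set of eat facts are assumptions for this illustration. The caller is expected to pass in only those level k−1 relations that were useful at the previous level, which is how the pruning described above would enter.

```python
def inverse(rel):
    """Inverse of a binary relation, e.g. eat -> eaten."""
    return {(y, x) for (x, y) in rel}

def compose(r, s):
    """Composition: (x, z) holds if r(x, y) and s(y, z) for some y."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}

def next_level(prev_level, primitives):
    """Compose each level k-1 relation with each level one relation."""
    out = {}
    for name_r, r in prev_level.items():
        for name_p, prim in primitives.items():
            rel = compose(r, prim)
            if rel:  # keep only relations that some pair of objects satisfies
                out[f"{name_r}_{name_p}"] = rel
    return out

eat = {("songbirds", "worms"), ("songbirds", "insects"), ("hawks", "songbirds")}
level1 = {"eat": eat, "eaten": inverse(eat)}
level2 = next_level(level1, level1)
```

With these facts, `level2["eat_eat"]` relates hawks to worms and insects, since hawks eat songbirds and songbirds eat both.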

The chunks formed differ from the concepts defined in the MARVIN
system [7]. In that system, concept formation is data-driven. Examples
are presented and MARVIN generalizes examples into concepts by asking
questions. In contrast, chunks in the OPUS system are formed in a
model-driven fashion. Higher level relations consist of several primitive
relations, and these are tested against the input data. If no object satisfies
the relation, that relation is discarded; otherwise, it is used to define an
attribute.

3.  Two  Examples 

OPUS  has  applications  in  any  domain  where  objects  are  described  by  a  set 
of  features  and  a  set  of  binary  relations.  Two  examples  of  such  domains  are 
presented  in  the  following  sections:  the  food  chain  domain  and  the  genetics 
domain. 

3.1  The  Food  Chain  Domain 

In the food chain domain, we characterize animals using two features, size
and locomotion, and one relation, eat. For example, we describe songbirds
using the following facts: size(songbirds, medium), locomotion(songbirds,
fly), eat(songbirds, worms), eat(songbirds, insects), and
eat(hawk, songbirds). All fourteen objects are characterized by the same two
features. Fifty-one relational facts are asserted to describe the relationship
eat over the object set.

At first, OPUS uses features as attributes to classify the objects. The
feature size has the same simplicity value as locomotion, but a higher
inter-cluster difference value. Therefore size is chosen as the first attribute
to divide the object set in the hierarchy tree. For example, a class of
medium-sized objects is created with the following members: hawks, owls,
songbirds, and snakes. After the system has used locomotion to refine
classes, there are no attributes left and new attributes have to be defined.

In response, OPUS defines all possible level one relations. The
following complex relations and attributes are formed: eat_size,
eat_locomotion, eaten_size, eaten_locomotion. The first two describe
the size and locomotion of animals eaten by an object; the latter two
describe the size and locomotion of the animals that eat that object. These
four attributes are used to divide the existing classes. For example, the
class of medium-sized flying objects is refined using the attribute eat_size.
Hawks and owls eat medium and small animals, while songbirds only eat
small animals.

After the current attributes have been used to refine the classes, there
are only two classes with more than one object left: the class of frogs and
toads, and the class of hawks and owls. The level two relations eat_eat,
eat_eaten, eaten_eat and their inverses are formed, and concatenated
with the features to define level two attributes. Frogs and toads have
the same values for these new attributes; therefore that class is not
refined. However, hawks and owls have different values for the attribute
eat_eaten_size, namely [large, medium] and [large, medium, small].
That is, hawks eat animals which are eaten by large and medium sized
animals, while owls eat animals which are eaten by large, medium, and small
animals. Thus, the attribute eat_eaten_size is used to divide that class.
The next level relations cannot define attributes which refine the class of
frogs and toads, so the system terminates. The resulting hierarchy tree is
shown in Figure 1.

3.2  The  Genetics  Domain 

Let  us  now  consider  an  example  from  the  field  of  genetics.  The  clustering 
problem  in  genetics  consists  of  classifying  objects  based  not  only  on  their 
observable  features,  but  also  on  features  of  their  descendants  and  their 
ancestors. Gregor Mendel, the founding father of genetics, observed that
when a yellow garden pea was crossed with a green garden pea the resulting
offspring pea was yellow [4]. When he self-fertilized that pea, it produced
both yellow and green offspring. After he continued to self-fertilize peas,
he discovered that some of the yellow peas had yellow and green offspring
while other yellow peas only produced yellow offspring. Green peas
consistently had green offspring. Mendel thus hypothesized the class of
purebreds, peas which produce offspring with exactly the same features as the
parent, and the class of hybrids, peas which produce some offspring with the
same features and other offspring with features different from their parent.

When OPUS is provided with information about the color of each pea
and the offspring each pea produces, it defines the classes of hybrids and
purebreds. At first, the feature color is used as an attribute to
distinguish yellow and green peas. Next, the attributes offspring_color and
parent_color are defined. For the class of yellow peas, the inter-cluster
difference and the simplicity value for these attributes are equal. In the
running system parent_color was picked to refine the class of yellow peas.
At this point all peas are correctly identified as either a yellow or green
purebred or a (yellow) hybrid. Furthermore, the characterization of these
classes corresponds with Mendel's characterization. For example, the class
of green purebreds only has green offspring, while the class of hybrids
contains only yellow peas which have both yellow and green offspring. OPUS
continues to refine the classes, distinguishing, for example, between
purebreds with hybrids as parents and purebreds with purebreds as parents.
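
To make the role of a relational attribute in this domain concrete, the following Python sketch groups a handful of peas by color and by the set of colors of their offspring. The pea identifiers and the facts are invented for illustration and are not the data used with OPUS; the sketch also uses the offspring colors directly, whereas the running system described above happened to pick parent_color.

```python
from collections import defaultdict

# Hypothetical facts: color(Pea, Color) and offspring(Parent, Child).
color = {"p1": "yellow", "p2": "yellow", "p3": "green",
         "p4": "yellow", "p5": "green", "p6": "yellow", "p7": "green"}
offspring = {("p1", "p4"), ("p1", "p5"), ("p2", "p6"), ("p3", "p7")}

def offspring_color(pea):
    """Set of colors found among the offspring of `pea`."""
    return frozenset(color[child] for (parent, child) in offspring
                     if parent == pea)

# Group the parent peas by their own color and their offspring's colors.
classes = defaultdict(list)
for pea in ("p1", "p2", "p3"):
    classes[(color[pea], offspring_color(pea))].append(pea)
```

Under these invented facts, p1 and p2 are both yellow and so cannot be separated by color alone; the offspring colors place p1 with the hybrids and p2 with the yellow purebreds.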

Mendel continued his experiments, crossing peas with two different
traits, color and shape. He observed nine different classes, all having
different dominant and recessive traits. We supplied the OPUS system with
the color and shape of each pea and asserted the relations over the
object set. Again OPUS correctly defined and characterized as intermediate
classes all nine classes which Mendel identified as the various hybrids and
purebreds. For example, OPUS defines two different classes of round green
peas; one class has members which only have round green peas as offspring,
while the other class has members which produce round green and wrinkled
green offspring.

4.  Summary  and  Further  Research 

In this paper, we presented a conceptual clustering system which uses
relations over the object set to define a hierarchy of classes. Using the
relational information, this system is able to find classifications not
possible with conventional methods of conceptual clustering. We presented an
example from the domain of genetics where the system is able to form the
classes of hybrids and purebreds. Furthermore, we introduced a method to
define new attributes used in the classification process.

This work can be extended in two ways. It is unrealistic to assume that
all the information describing objects is available initially. An incremental
version of OPUS would build the hierarchy tree using partial information,
predicting missing properties of objects as well as missing objects. As
more data becomes available, predictions can either be confirmed, in which
case the belief in other similar predictions is reinforced, or they can be
disconfirmed, in which case a revision of classes occurs.

The  present  version  of  OPUS  can  handle  only  binary  relations.  An 
extension  of  the  system  working  with  n-ary  relations  would  greatly  enhance 
its  power.  For  example,  in  the  domain  of  chemistry,  some  compounds  are 
classified  as  acids,  alkalis  and  salts  depending  on  (among  other  properties) 
their  reactive  behavior.  For  example,  alkalis  react  with  acids  to  form  salts. 
Using  ternary  relations,  these  classes  could  be  formed  in  a  way  similar  to 
GLAUBER  [3],  yet  in  a  more  efficient  manner.  At  the  moment,  we  are 
actively  engaged  in  working  in  these  directions. 